Word-level human interpretable scoring mechanism for novel text detection using Tsetlin Machines

Abstract

Recent research in novelty detection focuses mainly on document-level classification, employing deep neural networks (DNN). However, the black-box nature of DNNs makes it difficult to extract an exact explanation of why a document is considered novel. In addition, dealing with novelty at the word level is crucial to provide a more fine-grained analysis than what is available at the document level. In this work, we propose a Tsetlin Machine (TM)-based architecture for scoring individual words according to their contribution to novelty. Our approach encodes a description of novel documents using the linguistic patterns captured by TM clauses. We then adapt this description to measure how much a word contributes to making documents novel. Our experimental results demonstrate how our approach breaks down novelty into interpretable phrases, successfully measuring novelty.
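The abstract does not spell out the scoring rule. As a rough, hypothetical illustration of the general idea only (not the authors' method), the sketch below treats each TM clause as a conjunction of word literals with a polarity, and credits the vote of every clause that fires on a document to the words that clause includes. The `Clause` class, the `score_words` function, the example clauses, and the credit rule are all assumptions; a trained TM would learn its clauses rather than have them written by hand.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Clause:
    """Hypothetical TM-style clause: a conjunction of word literals with a vote."""
    include: frozenset[str]  # words that must be present (positive literals)
    exclude: frozenset[str]  # words that must be absent (negated literals)
    polarity: int            # +1 votes for "novel", -1 votes against it

    def fires(self, words: set[str]) -> bool:
        # True only if every included word is present and no excluded word is.
        return self.include <= words and not (self.exclude & words)


def score_words(document: str, clauses: list[Clause]) -> dict[str, int]:
    """Credit each firing clause's vote to the positive literals it contains."""
    words = set(document.lower().split())
    scores = {word: 0 for word in words}
    for clause in clauses:
        if clause.fires(words):
            # A firing clause's included words are all in the document,
            # so every key below already exists in `scores`.
            for word in clause.include:
                scores[word] += clause.polarity
    return scores


# Hand-written clauses standing in for ones a trained TM would learn.
EXAMPLE_CLAUSES = [
    Clause(frozenset({"quantum", "protein"}), frozenset(), +1),
    Clause(frozenset({"protein"}), frozenset({"quantum"}), -1),
    Clause(frozenset({"quantum"}), frozenset(), +1),
]

if __name__ == "__main__":
    print(score_words("Quantum effects in protein folding", EXAMPLE_CLAUSES))
```

For "Quantum effects in protein folding", the first and third clauses fire, so "quantum" scores 2, "protein" scores 1, and the remaining words score 0. Crediting only positive literals keeps each score tied to words that actually appear in the document, which is one simple way to make the result readable word by word.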

Similar resources

Protein Word Detection using Text Segmentation Techniques

Literature in Molecular Biology is abundant with linguistic metaphors. There have been works in the past that attempt to draw parallels between linguistics and biology, driven by the fundamental premise that proteins have a language of their own. Since word detection is crucial to the decipherment of any unknown language, we attempt to establish a problem mapping from natural language text to p...

Text Genre Detection Using Common Word Frequencies

In this paper we present a method for detecting the text genre quickly and easily following an approach originally proposed in authorship attribution studies which uses as style markers the frequencies of occurrence of the most frequent words in a training corpus (Burrows, 1992). In contrast to this approach we use the frequencies of occurrence of the most frequent words of the entire written l...

Word and phone level acoustic confidence scoring

This paper presents a word level confidence scoring technique based on a combination of multiple features extracted from the output of a phonetic classifier. The goal of this research was to develop a robust confidence measure based strictly on acoustic information. This research focused on methods for augmenting standard log likelihood ratio techniques with additional information to improve th...

Interpretable support vector machines for functional data

Support Vector Machines (SVM) has been shown to be a powerful nonparametric classification technique even for high-dimensional data. Although predictive ability is important, obtaining an easy-to-interpret classifier is also crucial in many applications. Linear SVM provides a classifier based on a linear score. In the case of functional data, the coefficient function that defines such linear sc...

Grammatical structures for word-level sentiment detection

Existing work in fine-grained sentiment analysis focuses on sentences and phrases but ignores the contribution of individual words and their grammatical connections. This is because of a lack of both (1) annotated data at the word level and (2) algorithms that can leverage syntactic information in a principled way. We address the first need by annotating articles from the information technology...

Journal

Journal title: Applied Intelligence

Year: 2022

ISSN: 0924-669X, 1573-7497

DOI: https://doi.org/10.1007/s10489-022-03281-1